RECURSIVE PARTITION STRUCTURES

By Alexander V. Gnedin and Yuri Yakubovich 1 

Utrecht University 

A class of random discrete distributions P is introduced by means 
of a recursive splitting of unity. Assuming supercritical branching, we 
show that for partitions induced by sampling from such P a power 
growth of the number of blocks is typical. Some known and some 
new partition structures appear when P is induced by a Dirichlet 
splitting. 

1. Introduction. By a random discrete distribution (or a paintbox) we
shall understand an infinite collection $P = (P_j)$ of nonnegative random
variables whose sum is unity. Interpreting the terms of P as frequencies of
distinct colors, Kingman's paintbox construction [16] defines a random
exchangeable partition V of an infinite set of balls labeled 1,2,... in such a
way that, conditionally given $(P_j)$, the generic ball n is painted color j with
probability $P_j$, independently of all other balls. The blocks of V are
composed of balls painted the same color. Two paintboxes which differ only by
the arrangement of terms in a sequence yield the same V; hence to maintain
symmetry we may identify the paintboxes with the point process $\sum_j \delta_{P_j}$.
See [3, 22] for extensive background on exchangeable partitions.

Let $K_{n,r}$ be the number of colors represented exactly r times on the first n
balls, and let $K_n$ be the total number of different colors represented on
the first n balls, so that $\sum_r K_{n,r} = K_n$ and $\sum_r r K_{n,r} = n$. The
sequence of joint distributions of $(K_{n,1},\dots,K_{n,n})$ for n = 1,2,... is a
partition structure, that is, a consistent family of distributions on
partitions [16]. We are interested in the asymptotic features of $K_n$ and the
$K_{n,r}$'s, as $n\to\infty$, for one particular class of models for P.

The functionals $K_n$ and $K_{n,r}$ have been studied in some depth for several
families of random discrete distributions. The best known instance is the
Poisson-Dirichlet/GEM paintbox, which induces V called the Ewens partition.
For the Ewens partition $K_n$ is approximately Gaussian with the mean
and the variance both growing logarithmically, and the sequence
$(K_{n,1},K_{n,2},\dots)$ converges, as $n\to\infty$, to a sequence of independent
Poisson variates [1]. The GEM realization of the Poisson-Dirichlet paintbox
amounts to the stick-breaking representation $P_j = W_1\cdots W_{j-1}(1-W_j)$
(j = 1,2,...) with independent $W_j$'s distributed according to beta$(\theta,1)$.
A larger class of models for P of this type, with arbitrary independent
identically distributed factors $W_j\in[0,1]$, was studied in [6], where it was
shown that, under very mild assumptions on the distribution of the $W_j$'s, the
behavior of $K_n$ is analogous to that in the Ewens case.

Each random discrete distribution P resulting from the stick-breaking can
be viewed as the collection of jump sizes of the process $(\exp(-S_t), t\ge 0)$,
where $(S_t)$ is a compound Poisson process. A considerable extension of this
scheme (see [7, 8]) appears when we assume $(S_t)$ to be a subordinator with some
infinite Lévy measure $\nu$. It is known that the orders of growth of $K_n$ and
the $K_{n,r}$'s are then determined by the behavior of the tail $\nu[x,\infty[$ for
$x\downarrow 0$. Specifically, if the tail behaves like $x^{-\alpha}\ell(1/x)$ with
$0<\alpha<1$ and $\ell$ a function of slow variation at infinity, then the order of
growth of $K_n$ and all $K_{n,r}$'s is $n^{\alpha}\ell(n)$, and with this scaling $K_n$
and each $K_{n,r}$ converge, almost surely, to constant multiples of the same
random variable [10]. A distinguished example of the latter situation is the
Poisson-Dirichlet paintbox [23] with two parameters $0<\alpha<1$ and
$\theta>-\alpha$, which induces the Ewens-Pitman partition structure whose
distribution is given by the formula

$$\mathbb{P}[K_{n,1}=k_1,\dots,K_{n,n}=k_n] = n!\,\frac{\prod_{i=1}^{\ell-1}(\theta+i\alpha)}{(1+\theta)_{(n-1)\uparrow}}\,\prod_{r=1}^{n}\left(\frac{(1-\alpha)_{(r-1)\uparrow}}{r!}\right)^{k_r}\frac{1}{k_r!},$$

where $(a)_{i\uparrow} = a(a+1)\cdots(a+i-1)$ stands for rising factorials, and
$\ell = k_1+\cdots+k_n$. A construction of this partition structure via
$\exp(-S_t)$ is given in [7], and in Section 6.1 we briefly recall the original
construction [20] of the two-parameter paintbox. Very different asymptotic
behavior appears when the tail $\nu[x,\infty[$ is slowly varying at zero, as, for
example, for gamma subordinators: in this case the moments of $K_n$ and the
$K_{n,r}$'s are slowly varying functions of n, all $K_{n,r}$'s grow on the same
scale but slower than $K_n$ and, subject to a suitable normalization, $K_n$ is
asymptotically Gaussian [2, 11].

The stick-breaking model for P is the simplest instance of a recursive
construction in the sense of the present paper. By stick-breaking the unity
splits in two pieces $1-W_1$ and $W_1$; the first piece becomes a term of P,
and the second keeps on dividing by the same rule, hence producing again
a term for P and a piece which divides further, and so on. The paintbox
associated with $\exp(-S_t)$, for $(S_t)$ a subordinator with infinite Lévy
measure, can also be realized via a recursive construction which produces, at
each step, infinitely many terms for P and exactly one piece to undergo further
splitting.

In this paper, the principal step away from the models of the stick-breaking
type is that we deal with recursive models for P in which a random splitting of
unity involves some branching. We assume an inductive procedure in which each
step yields a collection of terms included in P, and a multitude of divisible
pieces to iterate a random splitting rule. A typical example of our class of
models is the paintbox arising by the following construction of a random Cantor
set $\mathcal{R}$ (see [18] for the general theory of recursive constructions of
this kind). Start with dividing the unit interval in three intervals of sizes
$X_1, Y_2, X_3$, from the left to the right, as obtained by cutting [0, 1] at the
locations of two uniform order statistics. Remove the middle interval and
iterate the operation of cutting and removing the middle independently on the
two other intervals (considered as scaled copies of [0, 1]), then iterate on
four intervals, and so on. A random set $\mathcal{R}$ of Lebesgue measure zero is
defined as the complement to the union of all removed intervals, and the
collection of lengths of the removed intervals arranged in some sequence
defines a paintbox P. It will follow from the main result of this paper
(Theorem 5) that $K_n$ and each $K_{n,r}$ grow for this P like $n^{\alpha^*}$ with
exponent $\alpha^* = (\sqrt{17}-3)/2$, which is equal to the Malthusian parameter
of a related branching process and is also equal to the Hausdorff dimension of
$\mathcal{R}$ [18, 19].

In wider terms, our construction is described as follows. At step one the 
unity is randomly divided in some collection of solids and some collection 
of crumbs. The solids immediately suspend further transformation while the 
crumbs keep on falling apart. At step two the crumbs are split further by 
the same random rule, the newly created solids become indivisible and the 
crumbs are subject to further division, and so on. Eventually, the crumbs 
decompose completely in solids, and the sizes of solids (arranged in some 
sequence) comprise the paintbox P. 

We will show that the power growth of $K_n$ and the $K_{n,r}$'s is quite common
for V derived from such a recursive paintbox with supercritical branching.
Moreover, under the power scaling $K_n$ and the $K_{n,r}$'s all converge to
constant multiples of the same random variable M which can be characterized in
terms of a distributional fixed-point equation. Some explicit moment
computations for M are possible for instances of the splitting procedure based
on the Dirichlet distribution; for some choices of the parameters these yield
the paintboxes of the Poisson-Dirichlet $(\alpha,\alpha/d)$-type (d = 1, 2, ...)
and for some other choices yield new paintboxes, hence novel partition
structures.

2. Malthusian hypothesis and a martingale. Let $(X,Y) = ((X_i),(Y_j))$
be two sequences of random variables with values in [0,1]. To introduce a
genealogical structure of the division process it will be convenient to assume
that the sequences are labeled by two disjoint subsets of $\mathbb{N}$. We further
require that

(1)  $$\sum_i X_i + \sum_j Y_j = 1, \qquad \mathbb{E}\Big[\sum_i X_i\Big] < 1, \qquad \mathbb{E}[\#\{i : X_i > 0\}] > 1.$$

The division process starts with a sole unit crumb $\xi$ (generation 0) which
produces the first generation of crumbs $(\xi_i)$ and solids $(\eta_j)$ whose joint
law and labels are the same as for (X, Y). Inductively, the offspring of
generation k - 1 are crumbs $(\xi_{i_1,\dots,i_{k-1},i})$ and solids
$(\eta_{i_1,\dots,i_{k-1},j})$. The solids stop division, while each crumb
$\xi_{i_1,\dots,i_k}$ splits further into crumbs $(\xi_{i_1,\dots,i_k,i})$ and solids
$(\eta_{i_1,\dots,i_k,j})$ whose labeling and sizes relative to the parent crumb
follow the law of (X, Y), independently of the history and the sizes of other
members of the current generation. The first two assumptions in (1) guarantee
that the total size of solids over all generations is unity, hence these sizes
(arranged in a sequence) define a paintbox P. The third assumption in (1) says
that the branching of crumbs is supercritical.
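The division process just described can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the splitting rule is the random Cantor rule from the Introduction (two crumbs and one solid, cut at two uniform points), the process is stopped after a fixed number of generations, and the names `cantor_split` and `divide` are hypothetical helpers that do not come from the paper.

```python
import random


def cantor_split(rng):
    """Split a unit crumb at two independent uniform points of [0, 1].

    Returns (crumbs, solids) as relative sizes: the two outer intervals are
    crumbs and the middle interval is the single solid.
    """
    u, v = sorted((rng.random(), rng.random()))
    return [u, 1.0 - v], [v - u]


def divide(split, generations, rng):
    """Iterate the splitting rule for a fixed number of generations.

    Returns the sizes of all solids created so far and the sizes of the
    crumbs of the last generation, which have not been split yet.
    """
    crumbs, solids = [1.0], []
    for _ in range(generations):
        next_crumbs = []
        for size in crumbs:
            xs, ys = split(rng)
            next_crumbs.extend(size * x for x in xs)
            solids.extend(size * y for y in ys)
        crumbs = next_crumbs
    return solids, crumbs


rng = random.Random(0)  # fixed seed, for reproducibility only
solids, crumbs = divide(cantor_split, 12, rng)
print(len(solids), len(crumbs), sum(solids) + sum(crumbs))
```

At every generation the solids and the remaining crumbs together still account for the whole unit mass; the first two assumptions in (1) are what force the crumb part to vanish in the limit.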

Introduce the intensity measures $\sigma$ and $\nu$ by requiring the equalities

$$\mathbb{E}\Big[\sum_i f(X_i)\Big] = \int f(x)\,\sigma(dx), \qquad \mathbb{E}\Big[\sum_j f(Y_j)\Big] = \int f(x)\,\nu(dx)$$

to hold for all nonnegative measurable functions f. Substituting power
functions $f(x) = x^{\alpha}$ in these formulas yields the Mellin transforms of the
measures

$$\psi(\alpha) := \int x^{\alpha}\,\sigma(dx), \qquad \varphi(\alpha) := \int x^{\alpha}\,\nu(dx).$$

Recall that, as a function of a complex parameter, the Mellin transform of a
measure on [0, 1] is analytic in the half-plane to the right of the convergence
abscissa, has a ridge on $\operatorname{Im}\alpha = 0$ and decreases on the real
half-line.

The Malthusian hypothesis accepted in this paper amounts to the assumptions
that:

• there exists a solution $\alpha^*$ to the equation

(2)  $$\psi(\alpha) = 1$$

(which then satisfies $\alpha^* \in\, ]0,1[$ since $\psi(1) < 1 < \psi(0)$ by (1)),
• there exists $\epsilon > 0$ such that $\alpha^*$ is the unique solution to (2) in
the half-plane $\{\alpha : \operatorname{Re}\alpha > \alpha^* - \epsilon\}$ and
$\varphi(\alpha^* - \epsilon) < \infty$.

Following the established tradition in the theory of branching processes
we call $\alpha^*$ the Malthusian exponent. Obvious sufficient conditions for the
Malthusian hypothesis are $\psi(0) < \infty$ and $\varphi(0) < \infty$. Note also
that the second part of the Malthusian hypothesis implies that $\sigma$ is not
supported by a geometric progression, since otherwise (2) would have infinitely
many periodically spaced roots on the line $\operatorname{Re}\alpha = \alpha^*$.
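Since the Mellin transform decreases on the real half-line, the real root of (2) can be located by bisection. The following sketch assumes the random Cantor example of the Introduction, where each of the two crumbs has a beta(1, 2) marginal, so that the Mellin transform of the crumb intensity is 4/((α+1)(α+2)); the function names are illustrative only.

```python
import math


def malthusian_exponent(psi, lo=0.0, hi=1.0, tol=1e-12):
    """Solve psi(alpha) = 1 on ]lo, hi[ by bisection.

    Assumes psi is continuous and decreasing with psi(lo) > 1 > psi(hi).
    """
    if not (psi(lo) > 1.0 > psi(hi)):
        raise ValueError("psi(lo) > 1 > psi(hi) is required")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi_cantor(alpha):
    # Two crumbs, each beta(1, 2): E[X^alpha] = 2 / ((alpha + 1)(alpha + 2)).
    return 4.0 / ((alpha + 1.0) * (alpha + 2.0))


alpha_star = malthusian_exponent(psi_cantor)
print(alpha_star, (math.sqrt(17.0) - 3.0) / 2.0)
```

For this example the root is available in closed form, which gives a direct check of the numerical value against the exponent quoted in the Introduction.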

Summing the $\alpha^*$th powers of crumbs in a given generation yields a
remarkable process called the intrinsic martingale [13]

$$M_k := \sum_{i_1,\dots,i_k} \xi_{i_1,\dots,i_k}^{\alpha^*}, \qquad k = 1,2,\dots,$$

which, under the Malthusian hypothesis, converges to a terminal value

$$M := \lim_{k\to\infty} M_k$$

with E[M] = 1; see [17]. The limit variable satisfies the distributional
fixed-point equation

(3)  $$M \stackrel{d}{=} \sum_i X_i^{\alpha^*} M^{(i)},$$

where the $M^{(i)}$ are independent copies of M, independent of X. It is known
that (3) along with E[M] = 1 uniquely characterizes M; see [12],
Proposition 3(a).
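As an informal illustration of the intrinsic martingale, one can sample the crumbs of a fixed generation and sum their powers. The sketch below assumes the random Cantor example, for which the Malthusian exponent is (√17 - 3)/2; the generation number, sample size and seed are arbitrary choices, and the Monte Carlo mean is only expected to be close to 1.

```python
import math
import random

ALPHA_STAR = (math.sqrt(17.0) - 3.0) / 2.0  # exponent of the Cantor example


def crumb_generation(k, rng):
    """Sizes of the 2**k crumbs of generation k in the random Cantor construction."""
    crumbs = [1.0]
    for _ in range(k):
        nxt = []
        for size in crumbs:
            u, v = sorted((rng.random(), rng.random()))
            nxt.append(size * u)          # left crumb
            nxt.append(size * (1.0 - v))  # right crumb
        crumbs = nxt
    return crumbs


def intrinsic_martingale(k, rng):
    """M_k: the sum of the ALPHA_STAR-th powers of the generation-k crumbs."""
    return sum(c ** ALPHA_STAR for c in crumb_generation(k, rng))


rng = random.Random(1)  # fixed seed, for reproducibility only
samples = [intrinsic_martingale(4, rng) for _ in range(5000)]
mean_m4 = sum(samples) / len(samples)
print(mean_m4)
```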

3. The mean values of counts. Consider the powered sums of sizes of all
solids that make up the paintbox

$$G_{\alpha} := \sum_j P_j^{\alpha} = \sum_{k=1}^{\infty}\ \sum_{i_1,\dots,i_{k-1},j} \eta_{i_1,\dots,i_{k-1},j}^{\alpha}$$

and let

$$p(\alpha) := \mathbb{E}[G_{\alpha}].$$

For integer arguments the value p(n) is the probability that n balls are
painted the same color. The first-split decomposition of the division process
yields the distributional equation

(4)  $$G_{\alpha} \stackrel{d}{=} \sum_i X_i^{\alpha} G_{\alpha}^{(i)} + \sum_j Y_j^{\alpha},$$

where the $G_{\alpha}^{(i)}$ are independent copies of $G_{\alpha}$ which are also
independent of (X, Y). Taking expectations gives
$p(\alpha) = \psi(\alpha)p(\alpha) + \varphi(\alpha)$, that is,

(5)  $$p(\alpha) = \frac{\varphi(\alpha)}{1-\psi(\alpha)}.$$

By the Malthusian hypothesis the expectations involved are finite, and the
function p is meromorphic in the half-plane
$\operatorname{Re}\alpha > \alpha^* - \epsilon$, with a sole simple pole at
$\alpha^*$. These analytic properties of p provide a background for establishing
the growth properties for the mean values of counts $K_n$ and $K_{n,r}$.

Conditionally given $(P_j)$ the probability that at least one of n balls is
painted color j is $1 - (1 - P_j)^n$, hence recalling the definition of p

(6)  $$\mathbb{E}[K_n] = \mathbb{E}\Big[\sum_j \big(1-(1-P_j)^n\big)\Big] = \sum_{m=1}^{n}\binom{n}{m}(-1)^{m+1}p(m).$$

In a similar way, computing the chance that color j is represented exactly r
times in n balls:

(7)  $$\mathbb{E}[K_{n,r}] = \mathbb{E}\Big[\sum_j \binom{n}{r}P_j^r(1-P_j)^{n-r}\Big] = \binom{n}{r}\sum_{m=0}^{n-r}\binom{n-r}{m}(-1)^m p(m+r).$$
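The alternating sums (6) and (7) can be evaluated exactly for small n once p is known at the integers. The sketch below assumes the random Cantor example, for which p(m) = 2/(m² + 3m - 2) (the ratio of the Mellin transform of the solid intensity to one minus that of the crumb intensity), and it uses exact rational arithmetic because the alternating binomial sums cancel heavily in floating point. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def p(m):
    """p(m) for the Cantor example: (2/((m+1)(m+2))) / (1 - 4/((m+1)(m+2)))."""
    return Fraction(2, m * m + 3 * m - 2)


def mean_blocks(n):
    """E[K_n] by the alternating sum (6)."""
    return sum(comb(n, m) * (-1) ** (m + 1) * p(m) for m in range(1, n + 1))


def mean_blocks_of_size(n, r):
    """E[K_{n,r}] by the alternating sum (7)."""
    return comb(n, r) * sum(
        comb(n - r, m) * (-1) ** m * p(m + r) for m in range(n - r + 1)
    )


n = 10
by_size = [mean_blocks_of_size(n, r) for r in range(1, n + 1)]
print(float(mean_blocks(n)), [float(x) for x in by_size])
```

Two sanity checks follow from the definitions: the means of the block counts by size add up to the mean number of blocks, and weighting them by r recovers n.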



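Theorem 1. Under assumption (1) and the Malthusian hypothesis, the
following asymptotics hold:

(8)  $$\mathbb{E}[K_n] = n^{\alpha^*}\,\frac{\Gamma(-\alpha^*)\,\varphi(\alpha^*)}{\psi'(\alpha^*)} + O(n^{\alpha^*-\epsilon}) \quad\text{as } n\to\infty,$$

(9)  $$\mathbb{E}[K_{n,r}] = n^{\alpha^*}\,\frac{\Gamma(r-\alpha^*)\,\varphi(\alpha^*)}{-r!\,\psi'(\alpha^*)} + O(n^{\alpha^*-\epsilon}) \quad\text{as } n\to\infty.$$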

Proof. Because the function p is bounded in the half-plane
$\operatorname{Re}\alpha > \alpha^* - \epsilon$ outside any neighborhood of
$\alpha^*$, and because p has a simple pole at $\alpha^*$, we can apply the Rice
method (see [5], Theorem 2(ii)) to the alternating sum (6) to obtain

(10)  $$\sum_{m=1}^{n}\binom{n}{m}(-1)^{m+1}p(m) = \operatorname*{Res}_{\alpha=\alpha^*}\ p(\alpha)\,\frac{\Gamma(1-\alpha)\,\Gamma(n+1)}{\alpha\,\Gamma(n+1-\alpha)} + O(n^{\alpha^*-\epsilon}).$$

The residue of p at $\alpha^*$ is equal to $-\varphi(\alpha^*)/\psi'(\alpha^*)$, which
taken together with $\Gamma(1-\alpha^*)/\alpha^* = -\Gamma(-\alpha^*)$ and
$\Gamma(n+\alpha)/\Gamma(n) \sim n^{\alpha}$ readily yields (8). The result for
$K_{n,r}$ can be obtained in the same way. Alternatively, observe that the sum in
(7) is asymptotic to a constant multiple of the rth derivative of (6) in the
variable n, hence (9) follows from (8) by a Tauberian argument. □
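To see the leading term of (8) at work, one can compare the exact mean number of blocks with its power-law approximation. The sketch assumes the random Cantor example, with Mellin transforms 4/((α+1)(α+2)) for the crumbs and 2/((α+1)(α+2)) for the solids and exponent (√17 - 3)/2; the choice n = 200 is arbitrary, and only rough agreement should be expected because lower-order terms are still visible at this size.

```python
from fractions import Fraction
from math import comb, gamma, sqrt

ALPHA_STAR = (sqrt(17.0) - 3.0) / 2.0


def p(m):
    """p(m) for the Cantor example."""
    return Fraction(2, m * m + 3 * m - 2)


def mean_blocks(n):
    """Exact E[K_n] from the alternating sum (6)."""
    return sum(comb(n, m) * (-1) ** (m + 1) * p(m) for m in range(1, n + 1))


def leading_constant():
    """Gamma(-a) * phi(a) / psi'(a) at a = ALPHA_STAR, the coefficient in (8)."""
    a = ALPHA_STAR
    phi = 2.0 / ((a + 1.0) * (a + 2.0))
    dpsi = -4.0 * (2.0 * a + 3.0) / ((a + 1.0) * (a + 2.0)) ** 2
    return gamma(-a) * phi / dpsi


n = 200
ratio = float(mean_blocks(n)) / n ** ALPHA_STAR
print(ratio, leading_constant())
```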
The first-split decomposition shows that the number of colors $K_n$ satisfies
a divide-and-conquer recurrence of the form

i 

where $(K_n^{(i)}, n = 1,2,\dots)$ are independent copies of $(K_n)$, and the joint
law of $(A_{n,i}, B_n)_{i\ge 1}$ follows by considering the partition of n induced
by the joint paintbox $(X_1, X_2, \dots; \sum_j Y_j)$. In other words, given (X, Y),
each of n balls is painted color i with probability $X_i$ and left uncolored
with probability $\sum_j Y_j$; then $A_{n,i}$ is the number of balls painted color
i and $B_n$ is the number of uncolored balls. Because (3) is a limit analogue of
this equation, the contraction method [25] can be exploited to show weak
convergence of scaled $K_n$. To argue the strong convergence we will apply an
indirect approach (also used in [10]) which relates the growth properties of
$K_n$, $K_{n,r}$ with the sizes of solids. Let

$$N_x := \#\{j : P_j \ge x\}$$

be the number of solids with size at least x.

Lemma 2. If the paintbox satisfies $N_x \sim L x^{-\alpha}$ a.s. as
$x\downarrow 0$, with $0 < \alpha < 1$ and L a nonnegative random variable, then
for $n\to\infty$

$$K_n/n^{\alpha} \to \Gamma(1-\alpha)\,L \quad\text{a.s.}$$

and

$$K_{n,r}/n^{\alpha} \to (\alpha\,\Gamma(r-\alpha)/r!)\,L \quad\text{a.s.}$$

Proof. Conditioning on the paintbox $(P_j)$, the value of L is fixed, hence
we are in the range of applicability of Karlin's result; see [15], Theorem 1,
equation (23) and page 396. Since the convergence holds conditionally on almost
every realization of $(P_j)$, the claim holds unconditionally. □
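The constant in Lemma 2 can be checked on a deterministic toy paintbox. The sketch below is an assumption-laden illustration, not part of the argument: it takes the fixed frequencies proportional to 1/j², for which the count of frequencies of size at least x grows like a constant times x to the power -1/2, and it compares the conditional mean of the number of colors (rather than the number of colors itself) with the limit predicted by the lemma. The truncation level and n are arbitrary.

```python
import math

ALPHA = 0.5
C = 6.0 / math.pi ** 2  # normalising constant: sum over j of C / j**2 equals 1
L = math.sqrt(C)        # N_x ~ L * x**(-ALPHA) as x -> 0 for P_j = C / j**2


def conditional_mean_blocks(n, terms=200_000):
    """E[K_n | P] = sum_j (1 - (1 - P_j)**n), truncated to the first `terms` colors."""
    return sum(1.0 - (1.0 - C / (j * j)) ** n for j in range(1, terms + 1))


n = 2500
ratio = conditional_mean_blocks(n) / n ** ALPHA
limit = math.gamma(1.0 - ALPHA) * L
print(ratio, limit)
```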

4. The limit distribution. To determine the limiting behavior of
$x^{\alpha^*}N_x$ as $x\to 0$ we shall connect the recursive paintbox construction
to a general Crump-Mode-Jagers (CMJ) branching process [13]. The idea is to map
the sizes of crumbs into a continuous time scale.

The setup for a CMJ branching process involves the random data $(\pi,\chi)$
with $\pi$ a prototypical point process on $\mathbb{R}_+$ according to which
descendants are born, and $(\chi(t), t\in\mathbb{R})$ a process called
characteristic (or a score of individual), which is nonnegative and satisfies
$\chi(t) = 0$ for $t < 0$. The branching process starts at time 0 with a single
progenitor which produces offspring at epochs of $\pi$, and each descendant
follows the same kind of behavior independently of the history and of the
coexisting individuals. Labeling individuals in the genealogical order by
integer sequences $w = j_1,\dots,j_n$, let $\tau_w$ be the birth epoch of the
generic individual. The CMJ process is defined as [13]

$$Z_t^{\chi} := \sum_w \chi_w(t-\tau_w),$$

which is the sum of characteristics of individuals born before t.

To represent the configuration of crumbs as a CMJ process we set
$\tau_w = -\log\xi_w$ for the crumb labeled w and we define the characteristic

$$\chi_w(t) = \#\{j : -\log(\eta_{wj}/\xi_w) \le t\}$$

to encode the configuration of solids produced by the crumb. It follows
easily from the definitions that $Z_t^{\chi} = N_{e^{-t}}$. A key point is to apply
[19], Theorem 5.4.



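Lemma 3. As $t\to\infty$,

$$e^{-\alpha^* t}\,Z_t^{\chi} \to \frac{\varphi(\alpha^*)}{-\alpha^*\psi'(\alpha^*)}\,M \quad\text{a.s.}$$

for M the terminal value of the intrinsic martingale.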

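Proof. Translated in our terms, Conditions 5.1 and 5.2 from [19] require
existence of integrable, bounded, nonincreasing positive functions $h_1$
and $h_2$ such that

$$\mathbb{E}\Big[\sup_t \frac{1}{h_1(t)}\int_t^{\infty} e^{-\alpha^* u}\,\pi(du)\Big] < \infty \qquad\text{and}\qquad \mathbb{E}\Big[\sup_t \frac{e^{-\alpha^* t}\chi(t)}{h_2(t)}\Big] < \infty.$$

These two inequalities follow from the Malthusian hypothesis with
$h_1(t) = h_2(t) = e^{-\epsilon t}$ for sufficiently small $\epsilon > 0$. Applying
[19], Theorem 5.4, we see that

(11)  $$e^{-\alpha^* t}\,Z_t^{\chi} \to N\,\frac{\int_0^{\infty} e^{-\alpha^* t}\,\mathbb{E}[\chi(t)]\,dt}{\int_0^{\infty} u\,e^{-\alpha^* u}\,\mu(du)} \quad\text{a.s.,}$$

where N is the terminal value of some martingale (different from the intrinsic
martingale) and $\mu$ is the intensity measure of $\pi$, that is, the image of the
measure $\sigma$ via the mapping $x\mapsto -\log x$. Since
$\mathbb{E}[\chi(t)] = \int 1(x \ge e^{-t})\,\nu(dx)$ by definition of the intensity
$\nu$, the numerator in the r.h.s. of (11) is

$$\int_0^{\infty} e^{-\alpha^* t}\,\mathbb{E}[\chi(t)]\,dt = \int_0^1 y^{\alpha^*-1}\int 1(x \ge y)\,\nu(dx)\,dy = \frac{1}{\alpha^*}\int x^{\alpha^*}\,\nu(dx) = \frac{\varphi(\alpha^*)}{\alpha^*}$$

by Fubini's theorem, which is applicable due to the Malthusian hypothesis.
Changing the variable in the denominator of the r.h.s. of (11) we see that it is
equal to $-\psi'(\alpha^*)$. Hence

$$e^{-\alpha^* t}\,Z_t^{\chi} \to \frac{\varphi(\alpha^*)}{-\alpha^*\psi'(\alpha^*)}\,N \quad\text{a.s.}$$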

We can also apply the same result to a CMJ branching process with a different
characteristic $\chi'(t) = 1(t \ge 0)$, which counts individuals born before t.
Condition 5.1 remains the same and Condition 5.2 becomes
$\sup_t e^{-\alpha^* t}/h_2(t) < \infty$, so we can take
$h_2(t) = e^{-\alpha^* t/2}$. Thus the Malthusian hypothesis implies that

(12)  $$e^{-\alpha^* t}\,Z_t^{\chi'} \to \frac{1}{-\alpha^*\psi'(\alpha^*)}\,N \quad\text{a.s.}$$

with the same N as above.

Biggins [4] derived similar asymptotics for $Z_t^{\chi'}$ in terms of branching
random walks. From [4], Theorem B and the Malthusian hypothesis

$$\frac{1}{T}\int_0^T e^{-\alpha^* t}\,dZ_t^{\chi'} \to \frac{1}{-\psi'(\alpha^*)}\,M \quad\text{a.s.}$$

as $T\to\infty$, where M is the terminal value of the intrinsic martingale.
Integration by parts and comparison with (12) show that N = M a.s. □

Translating the lemma back in terms of the sizes of solids we have: 



Corollary 4. As $x\downarrow 0$,

$$x^{\alpha^*}N_x \to \frac{\varphi(\alpha^*)}{-\alpha^*\psi'(\alpha^*)}\,M \quad\text{a.s.}$$

Next, combining this corollary with Lemma 2 gives our principal asymp- 
totic result which complements Theorem 1. 



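Theorem 5. If assumption (1) and the Malthusian hypothesis both hold,
then

$$\frac{K_n}{n^{\alpha^*}} \to \frac{\varphi(\alpha^*)\,\Gamma(-\alpha^*)}{\psi'(\alpha^*)}\,M \quad\text{as } n\to\infty,$$

$$\frac{K_{n,r}}{n^{\alpha^*}} \to \frac{\varphi(\alpha^*)\,\Gamma(r-\alpha^*)}{-\psi'(\alpha^*)\,r!}\,M \quad\text{as } n\to\infty,$$

almost surely.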






5. Moments of M. Formulas for moments of the terminal value of the
intrinsic martingale involve expectations of some symmetric functions in
the variables $(X_i)$. For each integer vector
$\lambda = (\lambda_1,\dots,\lambda_\ell)$ with components
$\lambda_1 \ge \dots \ge \lambda_\ell > 0$ let

$$m(\lambda) = \mathbb{E}\Big[\sum_{(\mu_1,\dots,\mu_\ell)}\ \sum_{i_1<\dots<i_\ell} X_{i_1}^{\alpha^*\mu_1}\cdots X_{i_\ell}^{\alpha^*\mu_\ell}\Big],$$

where the external sum expands over all distinct permutations
$(\mu_1,\dots,\mu_\ell)$ of the entries of $(\lambda_1,\dots,\lambda_\ell)$, and the
internal sum expands over all increasing $\ell$-tuples of labels of $(X_i)$. We
assume for the rest of the paper that these moments exist for all integer
vectors $\lambda$; this is always the case if the number of positive $X_i$'s does
not exceed some constant, since $X_i \le 1$ for all i.

Let $a_k = \mathbb{E}[M^k]$ (k = 0, 1, ...) be the moments of the terminal value M
of the intrinsic martingale (they are all finite; see [12], Proposition 4). In
principle, the moments can be determined recursively from the following
lemma.

Lemma 6. Under assumption (1), the Malthusian hypothesis and finiteness of
the moments $m(\lambda)$, the moments satisfy the recursion

(13)  $$a_k = \frac{1}{1-\psi(k\alpha^*)}\sum_{\lambda\neq(k)}\frac{k!}{\lambda_1!\cdots\lambda_\ell!}\,m(\lambda)\prod_{r=1}^{\ell}a_{\lambda_r} \qquad\text{for } k = 2,3,\dots,$$

where the initial values are $a_0 = a_1 = 1$ and the summation is over all
nonincreasing positive integer sequences $\lambda = (\lambda_1,\dots,\lambda_\ell)$
with $\lambda_1 + \dots + \lambda_\ell = k$ and $\ell > 1$.

Proof. Take the kth power in (3) and expand the r.h.s. by the multinomial
formula. The terms containing $a_k$ are those with $\lambda = (k)$, and their total
coefficient is $m((k)) = \psi(k\alpha^*)$; collecting them on the left side yields
the recursion. □



6. Dirichlet splittings. 



6.1. Bessel bridges. It is known that the Ewens-Pitman $(\alpha,\alpha)$ partition
structure ($0 < \alpha < 1$) can be induced by a paintbox P whose components are
the lengths of excursions from 0 of a Bessel bridge $(B_t, t\in[0,1])$ of
dimension $2 - 2\alpha$ [7, 21, 23]. A possible recursive construction of P is the
following. For each $t\in[0,1]$ define $G_t = \sup\{s < t : B_s = 0\}$ and
$D_t = \inf\{s > t : B_s = 0\}$. Choose a random point T from some distribution on
]0,1[, independently of $(B_t)$. The bridge $(B_t)$ decomposes into three
components according as $0 < t < G_T$ (bridge), $G_T < t < D_T$ (excursion) or
$D_T < t < 1$ (bridge). Given $G_T$ and $D_T$, the components are conditionally
independent and the first and the third components are scaled copies of
$(B_t)$. It follows that the iterated division in three intervals
(bridge-excursion-bridge) yields the same P as the recursive construction
directed by $(X_1, Y_2, X_3) = (G_T, D_T - G_T, 1 - D_T)$ with arbitrary distribution
for T.

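In particular, assuming T = U for U uniform on [0, 1], the law of
$(X_1, Y_2, X_3)$ is Dirichlet with parameters $(\alpha, 1-\alpha, \alpha)$, because this
is the law of $(G_U, D_U - G_U, 1 - D_U)$, as in [21]. Computing

$$\psi(\beta) = \frac{2\alpha}{\beta+\alpha}, \qquad \varphi(\beta) = \frac{\Gamma(1+\alpha)\,\Gamma(\beta+1-\alpha)}{\Gamma(1-\alpha)\,\Gamma(\beta+1+\alpha)}$$

we see that in this case the Malthusian exponent is $\alpha^* = \alpha$. Applying
Theorem 5 we obtain the asymptotics

$$\frac{K_n}{n^{\alpha}} \to \frac{\Gamma(\alpha)}{\Gamma(2\alpha)}\,M \quad\text{a.s.,}$$

which is the "$\alpha$-diversity" of V previously shown in [10, 22, 23] by
different methods. The variable M has moments

$$\mathbb{E}[M^q] = \frac{\Gamma(\alpha)\,\Gamma(q+1)}{[\Gamma(\alpha)/\Gamma(2\alpha)]^{q}\,\Gamma((q+1)\alpha)}$$

and its distribution is a size-biased version of the Mittag-Leffler
distribution; see [21].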

Choosing any other distribution for T (e.g., T = 1/2 a.s.) will result in a
different distribution for $(X_1, Y_2, X_3)$, although, by the special
self-similarity properties of this $(\alpha,\alpha)$ case, the law of P (up to
arrangement of terms) will not change.

We recall the original construction of the Poisson-Dirichlet paintbox from [20];
see also [22]. Let $0 < \alpha < 1$ and $\theta > -\alpha$. Take $(W_j)$ to be a
sequence of independent random variables where $W_j$ has
beta$(\theta + j\alpha, 1-\alpha)$ distribution. Then the Poisson-Dirichlet
$(\alpha,\theta)$ paintbox can be composed of the terms
$$P_j = W_1\cdots W_{j-1}(1-W_j).$$
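The stick-breaking recipe translates directly into a sampler for the first few terms of the paintbox. The sketch below is a minimal illustration: the function name, the number of terms, the parameter values and the seed are arbitrary choices, and the only library call is the standard beta sampler.

```python
import random


def poisson_dirichlet_terms(alpha, theta, count, rng):
    """First `count` stick-breaking terms and the mass that is still unbroken.

    The j-th factor W_j is drawn from beta(theta + j * alpha, 1 - alpha),
    independently over j, and the j-th term is W_1 ... W_{j-1} (1 - W_j).
    """
    terms, remaining = [], 1.0
    for j in range(1, count + 1):
        w = rng.betavariate(theta + j * alpha, 1.0 - alpha)
        terms.append(remaining * (1.0 - w))
        remaining *= w
    return terms, remaining


rng = random.Random(3)  # fixed seed, for reproducibility only
terms, remaining = poisson_dirichlet_terms(0.5, 0.25, 50, rng)
print(terms[:5], remaining)
```

The terms produced so far and the unbroken remainder always add up to one, which is a convenient check on the bookkeeping.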

6.2. Other tripartite Dirichlet splittings. To extend the above Bessel
bridge model assume that the triple $(X_1, Y_2, X_3)$ has a Dirichlet distribution
with parameters $(\gamma, \beta, \gamma)$, where $\beta, \gamma > 0$. In this case the
intensity measure $\sigma$ has a density which is beta$(\gamma, \beta+\gamma)$
multiplied by 2, and $\nu$ is beta$(\beta, 2\gamma)$. Their Mellin transforms are

$$\psi(\alpha) = 2\,\frac{\Gamma(\beta+2\gamma)\,\Gamma(\alpha+\gamma)}{\Gamma(\alpha+\beta+2\gamma)\,\Gamma(\gamma)} \qquad\text{and}\qquad \varphi(\alpha) = \frac{\Gamma(\beta+2\gamma)\,\Gamma(\alpha+\beta)}{\Gamma(\beta)\,\Gamma(\alpha+\beta+2\gamma)}.$$

The recursion for moments $a_n = \mathbb{E}[M^n]$ in Lemma 6 specializes as

(14)  $$a_n = \sum_{k=0}^{n}\binom{n}{k}a_k a_{n-k}\,\frac{\Gamma(\beta+2\gamma)\,\Gamma(k\alpha^*+\gamma)\,\Gamma((n-k)\alpha^*+\gamma)}{\Gamma(n\alpha^*+\beta+2\gamma)\,\Gamma(\gamma)^2}, \qquad n \ge 2,$$

with the initial values $a_0 = a_1 = 1$.
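Recursion (14) can be turned into a short numerical routine by moving the two boundary terms, which contain the unknown moment itself, to the left side. The sketch below assumes that reading of (14) and uses the Bessel bridge case of Section 6.1 (Dirichlet parameters α, 1 - α, α and exponent α) as a test, since the moments are known in closed form there; all function names and the value α = 0.3 are illustrative.

```python
from math import comb, exp, gamma, lgamma


def mixed_moment(s, t, g, b):
    """E[X_1**s * X_3**t] for (X_1, Y_2, X_3) distributed as Dirichlet(g, b, g)."""
    return exp(
        lgamma(b + 2 * g) + lgamma(s + g) + lgamma(t + g)
        - 2 * lgamma(g) - lgamma(s + t + b + 2 * g)
    )


def moments(n_max, g, b, alpha_star):
    """a_0, ..., a_{n_max} (for n_max >= 1) from recursion (14), solved for a_n."""
    a = [1.0, 1.0]
    for n in range(2, n_max + 1):
        # The k = 0 and k = n terms each equal a_n * E[X_1**(n * alpha_star)].
        edge = 2.0 * mixed_moment(n * alpha_star, 0.0, g, b)
        inner = sum(
            comb(n, k) * a[k] * a[n - k]
            * mixed_moment(k * alpha_star, (n - k) * alpha_star, g, b)
            for k in range(1, n)
        )
        a.append(inner / (1.0 - edge))
    return a


def bessel_bridge_moment(q, alpha):
    """Closed-form q-th moment of M in the Bessel bridge case of Section 6.1."""
    return (gamma(alpha) * gamma(q + 1) * (gamma(2 * alpha) / gamma(alpha)) ** q
            / gamma((q + 1) * alpha))


alpha = 0.3
a = moments(4, alpha, 1.0 - alpha, alpha)
print(a, [bessel_bridge_moment(q, alpha) for q in range(5)])
```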

In the case when $r = \beta + \gamma$ is an integer, $\psi$ is a rational function,
and (2) is actually a polynomial equation in $\alpha$ of degree r. The case r = 1
covers the Bessel bridge instance of the previous section. For r = 1, 2, ...
simplification is possible by introducing variables

(15)  $$b_n = \frac{\Gamma(n\alpha^*+\gamma)}{\Gamma(\gamma)\,n!}\,a_n$$

for which the recursion (14) becomes

(16)  $$\sum_{k=0}^{n} b_k b_{n-k} = \frac{(n\alpha^*+\gamma)_{r\uparrow}}{(\gamma)_{r\uparrow}}\,b_n, \qquad n \ge 2.$$

Note that the same formulae also hold for n = 0, 1 since $b_0 = 1$ and in view
of (2). This allows us to characterize the generating function

$$h(y) := \sum_{k=0}^{\infty} b_k y^k$$

as a solution to the differential equation

(17)  $$z^{1-\gamma}\,\frac{d^r}{dz^r}\Big(z^{\gamma+r-1}\,h(z^{\alpha^*})\Big) = (\gamma)_{r\uparrow}\,h(z^{\alpha^*})^2.$$

In the variable $y = z^{\alpha^*}$ this equation is a nonlinear differential equation
with polynomial coefficients.

For instance, when r = 2, (2) becomes
$(\alpha^*+\gamma)(\alpha^*+\gamma+1) = 2\gamma(\gamma+1)$ and after some
manipulations we obtain

$$(\alpha^*)^2\,y^2 h''(y)/(\gamma(\gamma+1)) + y h'(y) + h(y) = h^2(y).$$

We did not succeed in solving the equation in terms of some known special
functions for $r \ge 2$. We can, nevertheless, show that this partition structure
is of novel type:

Lemma 7. For no r = 2, 3, ... and no $\gamma\in\,]0,r[$ does the recursive
partition structure obtained by the recursive tripartite Dirichlet splitting
with parameters $(\gamma, r-\gamma, \gamma)$ belong to the Ewens-Pitman two-parameter
family of partition structures.

Proof. The statement follows by computing the probabilities p(n) for n
balls to be painted the same color. Indeed, in the $(\alpha,\theta)$-model this
probability is [22]

$$p_{\alpha,\theta}(n) = \frac{(1-\alpha)_{(n-1)\uparrow}}{(1+\theta)_{(n-1)\uparrow}},$$

and in our model it is

$$p(n) = \frac{(r-\gamma)_{n\uparrow}}{(r+\gamma)_{n\uparrow} - 2(\gamma)_{n\uparrow}}.$$

Assuming the coincidence for some value of parameters $(\alpha,\theta)$ we must
have $\alpha = \alpha^*$ and $p(n) = p_{\alpha,\theta}(n)$ for all n. Analyzing the
behavior of these probabilities as $n\to\infty$ we find that

$$p_{\alpha,\theta}(n) \sim \frac{\Gamma(1+\theta)}{\Gamma(1-\alpha)}\,n^{-\alpha-\theta} \qquad\text{and}\qquad p(n) \sim \frac{\Gamma(r+\gamma)}{\Gamma(r-\gamma)}\,n^{-2\gamma},$$

whence $\theta = 2\gamma - \alpha$. Substituting this value in the equation
$p(2) = p_{\alpha,\theta}(2)$ we see that
$\alpha = \gamma - \frac{r^2-r}{2r-\gamma}$. Comparing again the rates of decrease
of $p_{\alpha,\theta}(n)$ and p(n) for these particular values of $\alpha, \theta$ we
get

$$\frac{\Gamma(1+(r^2-r)/(2r-\gamma)+\gamma)}{\Gamma(1+(r^2-r)/(2r-\gamma)-\gamma)} = \frac{\Gamma(r+\gamma)}{\Gamma(r-\gamma)}.$$

Since the r.h.s. increases as a function of $r > \gamma$ and the l.h.s. is the same
function evaluated at a different point, necessarily
$1 + \frac{r^2-r}{2r-\gamma} = r$. But this can happen only for r = 1 or
$r = \gamma$. In the latter case the parameter $\gamma = r$ is not admissible, so
the coincidence happens only for r = 1. □

6.3. Multiple splittings and $(\alpha,\alpha/d)$ partitions. Now suppose the
splitting procedure produces d + 1 crumbs ($d \ge 1$) and one solid at each step.
Suppose the joint distribution of the crumb sizes and the solid size relative to
their parent crumb is the Dirichlet distribution with parameters
$(\gamma,\dots,\gamma,\beta)$ where the $\gamma$'s correspond to the d + 1 crumbs and
$\beta$ corresponds to the solid. Mellin transforms of the intensity measures are

$$\psi(\alpha) = (d+1)\,\frac{\Gamma(\beta+(d+1)\gamma)\,\Gamma(\alpha+\gamma)}{\Gamma(\alpha+\beta+(d+1)\gamma)\,\Gamma(\gamma)}$$

and

$$\varphi(\alpha) = \frac{\Gamma(\beta+(d+1)\gamma)\,\Gamma(\alpha+\beta)}{\Gamma(\beta)\,\Gamma(\alpha+\beta+(d+1)\gamma)}.$$

The d = 1 case was considered in the preceding section. Similarly to the
above, explicit computations are only possible when $\beta + d\gamma = r$ is an
integer.

The simplest case r = 1 leads to an exactly solvable recursion for moments of
the terminal value M of the intrinsic martingale. We consider this case in more
detail. For r = 1, (2) becomes $(d+1)\gamma/(\alpha+\gamma) = 1$ with the solution
$\alpha^* = d\gamma$. The recursion for moments $a_n$ of M is easier to write down
in new variables $b_n$ defined by (15):

(18)  $$\sum b_{\lambda_1}\cdots b_{\lambda_{d+1}} = (nd+1)\,b_n, \qquad n \ge 2,$$






where the sum is taken over all nonnegative integer vectors
$(\lambda_1,\dots,\lambda_{d+1})$ with $\sum_j \lambda_j = n$. This leads to the
differential equation $d\,y\,h'(y) + h(y) = h(y)^{d+1}$ for the generating
function $h(y) = \sum_{n=0}^{\infty} b_n y^n$. Solving this equation we obtain

$$\mathbb{E}[M^q] = \frac{d^{\,q}\,\Gamma(\alpha^*+\alpha^*/d)^{q}\,\Gamma(1/d+q)}{\Gamma(\alpha^*/d)^{q-1}\,\Gamma(q\alpha^*+\alpha^*/d)\,\Gamma(1/d)}.$$

We recognize these as the moments of the limit distribution of
$\frac{d\,\Gamma(\alpha^*+\alpha^*/d)}{\Gamma(\alpha^*/d)}\,n^{-\alpha^*}K_n$, where
$K_n$ is the number of blocks in the Ewens-Pitman partition structure with
parameters $(\alpha^*, \alpha^*/d)$ [20, 22] restricted to the first n balls. The
limit has density proportional to $x^{1/d}f_{\alpha^*}(x)$, where $f_{\alpha}$ is the
density of the Mittag-Leffler distribution with parameter $\alpha$. The following
proposition shows that the partition structures coincide.

Proposition 8. The exchangeable partition obtained by a splitting scheme
with d + 1 crumbs and one solid whose joint distribution is Dirichlet
$(\underbrace{\alpha/d,\dots,\alpha/d}_{d+1},\,1-\alpha)$ ($0 < \alpha < 1$) coincides
with the Ewens-Pitman $(\alpha,\alpha/d)$ partition.

In the proof we use a mapping q which sends a collection B of kd + 1
elements with unit sum to a random collection of (k + 1)d + 1 elements with
unit sum. This mapping is defined for $\alpha\in\,]0,1[$ as follows:

(1) choose an element from the collection B by a size-biased pick;

(2) replace the chosen element Z by d + 2 elements
$(YZ, X_1Z,\dots,X_{d+1}Z)$, where $(Y, X_1,\dots,X_{d+1})$ is a Dirichlet
$(1-\alpha, \alpha/d,\dots,\alpha/d)$ random vector independent of B;

(3) remove the element YZ from the collection, divide all elements by
1 - YZ so that they sum to 1, and let q(B) be the rescaled collection.
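Steps (1)-(3) can be sketched as a small sampling routine. This is only an illustration under stated assumptions: a Dirichlet vector is sampled by normalising independent gamma variates, the helper names `dirichlet` and `q_map` are hypothetical, and the parameter values and seed are arbitrary.

```python
import random


def dirichlet(params, rng):
    """Sample a Dirichlet vector by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


def q_map(collection, alpha, d, rng):
    """Apply steps (1)-(3) once; returns the new collection and the discarded size."""
    # (1) size-biased pick: index i is chosen with probability collection[i].
    i = rng.choices(range(len(collection)), weights=collection, k=1)[0]
    z = collection[i]
    # (2) split the chosen element by an independent Dirichlet vector
    #     with parameters (1 - alpha, alpha/d, ..., alpha/d).
    y, *xs = dirichlet([1.0 - alpha] + [alpha / d] * (d + 1), rng)
    pieces = collection[:i] + collection[i + 1:] + [x * z for x in xs]
    # (3) discard Y * Z and rescale the remaining pieces to unit sum.
    scale = 1.0 - y * z
    return [piece / scale for piece in pieces], y * z


rng = random.Random(5)  # fixed seed, for reproducibility only
alpha, d, k = 0.5, 2, 1
start = dirichlet([alpha / d] * (k * d + 1), rng)
new_collection, discarded = q_map(start, alpha, d, rng)
print(len(start), len(new_collection), discarded)
```

One application removes the picked element and adds d + 1 new ones, so the collection grows by d elements while keeping unit sum.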

Lemma 9. Let B be a collection of kd + 1 random variables whose
joint distribution is Dirichlet $(\alpha/d,\dots,\alpha/d)$. Then q(B) is a
collection of (k + 1)d + 1 random variables with joint distribution Dirichlet
$(\alpha/d,\dots,\alpha/d)$. Moreover, the size of the discarded element is a
beta$(1-\alpha, (k+1+1/d)\alpha)$ random variable independent of q(B).

Proof. After the first step, the conditional distribution of elements in
B given that the size-biased pick has index i is the Dirichlet distribution
with one parameter $\alpha/d + 1$ for element i and other parameters $\alpha/d$. We
relabel the elements so that the chosen element is the first one. After the
second step, the elements in the collection have the Dirichlet distribution
with the first parameter $1-\alpha$ and other (k + 1)d + 1 parameters $\alpha/d$.
This can be easily verified by a moment calculation, exploiting the
independence of $(Y, X_1,\dots,X_{d+1})$ and B and the fact that the Dirichlet
parameters of $(Y, X_1,\dots,X_{d+1})$ sum to the Dirichlet parameter of the
replaced element. The statement of the lemma now follows from [14],
Chapter 40. □

Proof of Proposition 8. The Poisson-Dirichlet paintbox can be arranged in a
sequence by the stick-breaking procedure described at the end of Section 6.1.
We show that the terms of the paintbox in our model can also be arranged in
such a sequence, as follows. Since each crumb produces exactly one solid in the
model in focus, there is a one-to-one correspondence between solids and their
parent crumbs. Let the first solid $\eta(1)$ in the arrangement be the child of
the progenitor crumb $\xi$ and let $A_1 = \{\xi_1,\dots,\xi_{d+1}\}$ be the offspring
crumbs of $\xi$. Inductively, at time k let the first k solids have been arranged
as $\eta(1),\dots,\eta(k)$ and let $A_k$ be some collection of crumbs. The next
solid to be added to the sequence is chosen in the following way. Select a
crumb $\xi_w$ by a size-biased pick from all crumbs in the collection $A_k$, let
the next element $\eta(k+1)$ added to P be the solid child of this $\xi_w$, and
further replace $\xi_w$ in the collection $A_k$ by the offspring crumbs of
$\xi_w$, thus constructing
$A_{k+1} := (A_k \setminus \{\xi_w\}) \cup \{\xi_{w,1},\dots,\xi_{w,d+1}\}$. Proceed by
induction to arrange all solids in a sequence.

Now let us check that the sequence of solids $P = (\eta(k))$ has the same
distribution as the lengths produced by the stick-breaking procedure described
at the end of Section 6.1. At the first step, the law of $\eta(1)$ is the
marginal distribution of the Dirichlet distribution, which is
beta$(1-\alpha, (1+1/d)\alpha)$. For k = 1, 2, ... introduce the scaled collections
of crumbs

$$B_k := \{\xi/s_k : \xi\in A_k\}, \qquad\text{where } s_k = \sum_{\xi\in A_k}\xi = 1 - \eta(1) - \dots - \eta(k).$$

Then $B_1$ has Dirichlet $(\alpha/d,\dots,\alpha/d)$ distribution by [14],
Chapter 40, and $B_{k+1} = q(B_k)$ for all k. Using Lemma 9, we check by induction
that $B_k$ is a collection of dk + 1 elements whose joint distribution is
Dirichlet with all parameters $\alpha/d$, and that the relative size
$\eta(k)/s_{k-1}$ (with $s_0 = 1$) has beta$(1-\alpha, (k+1/d)\alpha)$ distribution
and is independent of $\eta(1),\dots,\eta(k-1)$ for all k. Taking
$W_k = 1 - \eta(k)/s_{k-1}$ yields the desired decomposition. □

7. Further subdivision of solids. Suppose we have some recursive paintbox
construction with the Mellin transforms $\psi_0$ and $\varphi_0$ of the intensity
measures for X and Y. A refined paintbox construction can be produced by
a further independent subdivision of each solid according to some sequence
$\hat{P} = (\hat{Y}_k)$ of nonnegative random variables with
$\sum_k \hat{Y}_k = 1$. This is equivalent to replacing $(Y_j)$ in the original
construction by an array $(Y_j\hat{Y}_k)$ (arranged in a sequence). By
independence, the Mellin transform of the new intensity measure for
$(Y_j\hat{Y}_k)$ in the refined process is the product

$$\varphi(\alpha) = \varphi_0(\alpha)\,\hat{\varphi}(\alpha), \qquad \hat{\varphi}(\alpha) = \mathbb{E}\Big[\sum_k \hat{Y}_k^{\alpha}\Big].$$

If the expected number of nonzero $\hat{Y}_k$'s is finite, then this new
construction satisfies the Malthusian hypothesis once the original construction
satisfied it. If an infinite number of positive $\hat{Y}_k$'s is possible, we
should also require $\hat{\varphi}(\alpha^* - \epsilon) < \infty$ to keep with the
Malthusian hypothesis.

One example where a similar additional subdivision of solids was used is
a representation of the Poisson-Dirichlet $(\alpha,\theta)$ paintbox [although it
does not fit exactly in our scheme since the expected number of nonzero $X_i$'s
is 1, violating (1)]. The recipe is the following [7, 9, 24]: divide the unit
interval by points of a stick-breaking process with $W_j$ i.i.d. beta$(\theta, 1)$
and then organize on each subinterval an independent subdivision by zeroes of
a Bessel process of dimension $2 - 2\alpha$. Here $\alpha^* = 0$ [due to a violation
of (1)] and $\varphi(\alpha^*) = \infty$.

When the Malthusian hypothesis still holds for the refined process, it has
some common features with the original one. For instance, the Malthusian
exponent remains the same, and the limit of $n^{-\alpha^*}K_n$ changes only by a
constant factor $\hat{\varphi}(\alpha^*)$. However, other characteristics of the
paintbox, such as the probabilities p(n) that n balls are painted the same
color, change significantly once any subdivision is made.